The Curse of Dimensionality in Data Mining and Time Series Prediction
Authors
Abstract
Modern data analysis tools have to work on high-dimensional data whose components are not independently distributed. High-dimensional spaces show surprising, counter-intuitive geometrical properties that have a large influence on the performance of data analysis tools. Among these properties, the concentration of the norm phenomenon means that Euclidean norms and Gaussian kernels, both commonly used in models, become inappropriate in high-dimensional spaces. This paper presents alternative distance measures and kernels, together with geometrical methods to decrease the dimension of the space. The methodology is applied to a typical time series prediction example.
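The concentration of the norm mentioned in the abstract can be illustrated with a small numerical experiment. The sketch below is not taken from the paper: it assumes i.i.d. standard normal components (a simplification, since the abstract stresses that real data components are not independent), and the function name `relative_norm_spread` and its parameters are hypothetical. It estimates the ratio of the standard deviation to the mean of Euclidean norms, which should shrink as the dimension grows.

```python
import math
import random
import statistics


def relative_norm_spread(dim, n_samples=500, seed=0):
    """Return std/mean of the Euclidean norms of random Gaussian vectors.

    Assumption (not from the paper): components are i.i.d. standard normal.
    """
    rng = random.Random(seed)
    norms = []
    for _ in range(n_samples):
        vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norms.append(math.sqrt(sum(x * x for x in vec)))
    # A small ratio means all norms are nearly equal relative to their size.
    return statistics.pstdev(norms) / statistics.mean(norms)


if __name__ == "__main__":
    for dim in (2, 10, 100, 1000):
        print(dim, round(relative_norm_spread(dim), 3))
```

Under these assumptions the printed ratio decreases steadily with the dimension, which is one way to see why distances based on the Euclidean norm lose their discriminative power in high-dimensional spaces.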
Related resources
Modeling and prediction of time-series of monthly copper prices
One of the main tasks in analyzing and designing a mining system is predicting the future behavior of prices. In this paper, different prediction methods from the fields of econometrics and financial management, such as ARIMA, TGARCH, and stochastic differential equations, are evaluated on the time series of monthly copper prices. Moreover, the performance of these metho...
Fuzzy clustering of time series data: A particle swarm optimization approach
With the rapid development of information-gathering technologies and access to large amounts of data, we need methods for analyzing data and extracting useful information from large raw datasets, and data mining is an important method for solving this problem. Clustering analysis, as the most commonly used function of data mining, has attracted many researchers in computer science. Because o...
Time-Series Classification in Many Intrinsic Dimensions
In the context of many data mining tasks, high dimensionality was shown to be able to pose significant problems, commonly referred to as different aspects of the curse of dimensionality. In this paper, we investigate in the time-series domain one aspect of the dimensionality curse called hubness, which refers to the tendency of some instances in a data set to become hubs by being included in un...
Feature Selection for Genomic and Proteomic Data Mining
The extreme dimensionality (also known as the curse of dimensionality) in genomic data has traditionally been a serious concern in many applications. This has motivated a lot of research in feature representation and selection, both aiming at reducing the dimensionality of features to facilitate training and prediction of genomic data. In this chapter, N denotes the number of training data samples, M t...
Risk prediction based on a time series case study: Tazareh coal mine
In this work, time series modeling was used to predict the Tazareh coal mine risks. For this purpose, initially, a monthly analysis of the risk constituents, including the frequency index and the incidence severity index, was performed. Next, a monthly time series diagram related to each one of these indices was for a nine-year period from 2005 to 2013. After extrusion of the trend, seasonali...
Publication date: 2005